Search for the standard model Higgs boson decaying to a $b\bar{b}$ pair in events with two oppositely-charged leptons using the full CDF data set
In the standard model of particle physics (SM) [1], electroweak symmetry breaking generates a fundamental scalar boson known as the Higgs boson. Although there is strong evidence of electroweak symmetry breaking, the Higgs boson has yet to be observed. The SM does not predict the mass of the Higgs boson, $m_H$, but the combination of precision electroweak measurements, including recent top quark and $W$ boson mass measurements from the Tevatron, constrains $m_H < 152$ GeV/$c^2$ at the 95% confidence level. Direct searches at LEP2, the Tevatron, and the LHC exclude all possible masses of the SM Higgs boson at the 95% confidence level or the 95% credibility level (C.L.), except within the ranges 116.6-119.4 GeV/$c^2$ and 122.1-127 GeV/$c^2$. A SM Higgs boson in these mass ranges would be produced in the $\sqrt{s} = 1.96$ TeV $p\bar{p}$ collisions of the Tevatron, and have a branching fraction to $b\bar{b}$ greater than 50% [9-11]. While the most sensitive searches for the SM Higgs boson at the LHC are those based on Higgs boson decays
to pairs of gauge bosons, the results presented here are currently the most sensitive for a SM Higgs boson decaying to a pair of $b$ quarks. The searches at the LHC in the four-lepton and diphoton final states offer precise measurements of the mass of the Higgs boson, while the results presented here provide information about the Higgs boson's couplings to fermions and are therefore complementary to the primary LHC search modes. In searches for the production of a Higgs boson in association with a vector boson ($WH$ or $ZH$), leptonic decays of the vector boson provide effective discrimination between the expected signal and the large, uncertain hadronic backgrounds. Searches for $p\bar{p} \to Z(\to \ell^+\ell^-)H(\to b\bar{b})$ ($\ell$ = electron or muon [12]) are among the most sensitive of the Tevatron low-mass Higgs boson searches, benefiting from low background rates and the ability to fully reconstruct both $Z$ and Higgs boson resonances. Previous searches in this final state have been reported by the LEP2, D0, CDF, CMS, and ATLAS collaborations.

In this Letter, we present an updated search for $ZH \to \ell^+\ell^- b\bar{b}$ events in which we expand upon the techniques of the previous CDF search and analyze data corresponding to more than twice the integrated luminosity used therein [14]. This search introduces new multivariate $b$-jet and lepton identification techniques and updated multi-stage artificial neural network (NN) background discrimination. This results in up to a 65% improvement in sensitivity to a Higgs boson signal compared to the methods used in our previous search [14]. Due to the larger data set, improved $b$-jet identification techniques that differ significantly from previously used methods, and expanded online event selection, 85% of $ZH \to \ell^+\ell^- b\bar{b}$ candidate events identified in this search were not present in the search sample used in the previous analysis [14].

The data were collected by the upgraded CDF II detector, correspond to 9.45 fb$^{-1}$ of Tevatron $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV, and constitute the final CDF II data set. The CDF II detector is described in detail elsewhere. Charged-particle trajectory (track) reconstruction and momentum determination capabilities are provided by silicon-based tracking systems surrounded by a drift chamber immersed in a 1.4 T magnetic field. The tracking systems are surrounded by calorimeters that provide coverage for $|\eta| < 3.6$ [20-22]. Jets are identified using a cone algorithm [23] that combines calorimeter energy deposits to form jets with a radius of 0.4 in $\eta$-$\phi$ space. External to the calorimeters, an additional system of drift chambers and scintillation counters provides muon detection for $|\eta| < 1.5$ [24].

CDF II records only those collision events that meet the criteria of an online event selection (trigger) system. To maximize signal acceptance we trigger inclusively on the properties of the candidate events, using data selected by three sets of trigger algorithms [25, 26]. The first set consists of algorithms that require the presence of one or two electron candidates. The electron candidates are required to have a minimum transverse energy ($E_T$) of 8 to 18 GeV, depending on the specific algorithm. The second set of trigger algorithms requires the presence of a muon candidate with a minimum transverse momentum ($p_T$) of 18 to 22 GeV/$c$, again depending on the specific algorithm. Because muons deposit only a small fraction of their momentum in the calorimeter, we gain additional online efficiency by using a third set of algorithms that accept events with significant missing calorimeter transverse energy [27], generally above 30 GeV. Several of the algorithms in this set impose additional requirements on the number (typically two) and transverse energy (generally greater than 10 GeV) of jets in the event. The combined triggers have a selection efficiency of approximately 90% (100%) for events within the acceptance of the CDF II detector containing two energetic muons (electrons) and two or more jets.

Additional offline requirements are imposed on the events selected by the trigger algorithms. Several requirements are applied to select events consistent with the decay of a $Z$ boson to either a pair of electrons or a pair of muons. Electrons and muons are selected by new NN-based algorithms optimized for efficient lepton identification [25, 26]. The NN algorithms combine muon detector, tracking, and calorimeter information, allowing for a 20% increase in $Z \to \ell^+\ell^-$ acceptance compared to the selections in Ref. [3]. We reject lepton candidates with $p_T < 10$ GeV/$c$ and require that the lepton candidate pairs have opposite electric charge when both are muons, or when both are electrons satisfying $|\eta| < 1.1$ [28]. Events in which the reconstructed $Z$ boson has a mass of less than 76 GeV/$c^2$ or greater than 106 GeV/$c^2$ are rejected. In addition to a $Z \to \ell^+\ell^-$ candidate, we require the presence of a candidate $H \to b\bar{b}$ decay, selecting events with exactly two or three jets with $|\eta| < 2.0$ and $E_T > 25$ GeV. Jet energies include corrections for local variations in calorimeter response, the energy contribution from additional $p\bar{p}$ interactions, and corrections specific to this analysis that assume that net missing transverse energy ($\not\!\!E_T$) [13] arises predominantly from the mismeasurement of jets [3, 23]. Events in which the combined mass of the two most energetic jets is less than 25 GeV/$c^2$ are removed. The resulting fractional resolution of the invariant mass of pairs of jets is estimated to be 11% [14].
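The kinematic requirements above can be summarized in a short sketch. This is only an illustration of the quoted thresholds, not the analysis code: the `Jet` class, the function name, and the convention that the dilepton mass `m_ll` and the dijet mass `m_jj` of the two most energetic jets are computed upstream are all assumptions made here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float   # transverse energy in GeV (after corrections)
    eta: float  # pseudorapidity


def passes_offline_selection(m_ll: float, jets: List[Jet], m_jj: float) -> bool:
    """Apply the Z mass window, jet multiplicity, and dijet mass cuts from the text.

    m_ll: reconstructed dilepton invariant mass in GeV/c^2 (hypothetical input).
    m_jj: invariant mass of the two most energetic jets in GeV/c^2 (hypothetical input).
    """
    # Reject events outside the Z boson mass window of 76 to 106 GeV/c^2.
    if m_ll < 76.0 or m_ll > 106.0:
        return False
    # Count jets with |eta| < 2.0 and E_T > 25 GeV; exactly two or three are required.
    selected = [j for j in jets if abs(j.eta) < 2.0 and j.et > 25.0]
    if len(selected) not in (2, 3):
        return False
    # Remove events whose leading dijet mass is below 25 GeV/c^2.
    return m_jj >= 25.0
```

Lepton identification, the charge requirement, and the trigger selection are deliberately left out of this sketch.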

Further event selection requires that at least one jet in the event, referred to as a $b$-tagged jet, be identified as consistent with the fragmentation of a $b$ quark. The data sample that satisfies all event selection criteria apart from the requirement of $b$-tagged jets is referred to as the PreTag sample. We perform the analysis on a subset of the PreTag sample that consists of events with at least one $b$-tagged jet. We employ a new multivariate $b$-tagging algorithm specifically designed to increase the $b$-tag efficiency and reduce the contamination of incorrectly tagged $q$ jets ($q = u, s, d, g$) in CDF $H \to b\bar{b}$ searches [29]. For each jet containing at least one charged-particle track, the algorithm produces a scalar value in the range -1 to 1. By comparing this value to two predetermined thresholds, the jet is classified as not tagged, loose tagged (L), or tight tagged (T), with all tight-tagged jets also satisfying the loose-tag definition. The thresholds defining these categories are chosen to optimize the combined expected exclusion sensitivity in simulated events. The definition of T (L) results in a per-jet tag rate of 42% (70%) for jets containing the fragmentation of a $b$ quark, 9% (27%) for jets containing the fragmentation of a charm quark and no $b$ quark, and 0.89% (8.9%) for jets without the fragmentation of a $b$ or charm quark.
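The two-threshold classification can be illustrated as follows. The text does not quote the threshold values, so `loose_cut` and `tight_cut` below are hypothetical placeholders, as is the function name; only the ordering (tight implies loose) and the score range of -1 to 1 are taken from the text.

```python
def classify_jet(score: float, loose_cut: float = 0.0, tight_cut: float = 0.5) -> str:
    """Map a b-tagger output in [-1, 1] to 'T', 'L', or 'untagged'.

    loose_cut and tight_cut are illustrative values, not the analysis thresholds.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("tagger score must lie in [-1, 1]")
    if tight_cut <= loose_cut:
        raise ValueError("tight threshold must exceed loose threshold")
    if score >= tight_cut:
        return "T"  # tight-tagged jets also satisfy the loose definition
    if score >= loose_cut:
        return "L"
    return "untagged"
```

Because the tight threshold lies above the loose one, every jet labeled "T" here would also pass the loose requirement, matching the nesting described above.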

We form four categories of events with $b$-tagged jets. Events with two or more jets with tight $b$ tags constitute the double-tight (TT) category. Events with one jet with a tight $b$ tag and one or more jets with a loose $b$ tag form the tight+loose (TL) category. Those with one jet with a tight $b$ tag and no other tight or loose $b$-tagged jet make up the single-tight (Tx) category. Events with two or more jets with loose $b$ tags constitute the double-loose (LL) category. If a data event satisfies more than one tag category, then the category of highest expected signal-to-background ratio is chosen, ranked TT, TL, Tx, and LL in decreasing order. The $b$-tagging algorithm employed in this search improves sensitivity to a $ZH$ signal by approximately 15% compared to the strategy used in our previous Letter [14].
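A minimal sketch of the category assignment, assuming each jet has already been labeled "T", "L", or anything else for untagged (the labels and the function name are choices made for this illustration). Checking the categories in the ranked order TT, TL, Tx, LL resolves events that satisfy more than one definition.

```python
from typing import List, Optional


def tag_category(jet_tags: List[str]) -> Optional[str]:
    """Return 'TT', 'TL', 'Tx', 'LL', or None for an event's per-jet tag labels."""
    n_tight = jet_tags.count("T")
    n_loose_only = jet_tags.count("L")  # loose-tagged jets that are not tight
    if n_tight >= 2:
        return "TT"
    if n_tight == 1 and n_loose_only >= 1:
        return "TL"
    if n_tight == 1:
        return "Tx"
    if n_loose_only >= 2:
        return "LL"
    return None  # no category: zero tags, or a single loose tag only
```

For example, an event with two tight jets and one loose jet satisfies both the TT and TL definitions and is assigned to TT.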

The four $b$-tag categories are subject to different systematic uncertainties, background compositions, and predicted $ZH$ content, and are therefore maintained as separate analysis channels. We further divide events by the $Z$ boson decay ($Z \to e^+e^-$ or $Z \to \mu^+\mu^-$), and again by the number of jets in the event (two or three). In total we form 16 exclusive channels that are simultaneously examined for $ZH$ content and jointly used to set upper limits on $\sigma_{ZH} \times \mathcal{B}(H \to b\bar{b})$. In simulated signal events we find a total selection efficiency of approximately 24%.
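The channel count follows from 4 tag categories times 2 lepton flavors times 2 jet multiplicities. A short enumeration, with label strings chosen here only for illustration:

```python
from itertools import product
from typing import List, Tuple

TAG_CATEGORIES = ("TT", "TL", "Tx", "LL")
LEPTON_FLAVORS = ("ee", "mumu")
JET_MULTIPLICITIES = (2, 3)


def build_channels() -> List[Tuple[str, str, int]]:
    """Enumerate every (tag category, lepton flavor, jet multiplicity) combination."""
    return list(product(TAG_CATEGORIES, LEPTON_FLAVORS, JET_MULTIPLICITIES))
```

Each tuple is distinct, so an event with a definite tag category, lepton flavor, and jet count falls into exactly one channel.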

Background processes that produce two leptons and two or three jets in the final state may satisfy the above selection criteria. Among these, the dominant background is $Z$+jets production, nearly saturated by $Z + q\bar{q}$ before $b$-tag requirements are imposed. After $b$ tagging, $Z + b\bar{b}$ and $Z + c\bar{c}$ are the most significant backgrounds. $Z$+jets events are modeled using ALPGEN [30] with PYTHIA [31] for particle showering and hadronization. Simulated $Z$+jets samples are normalized to match experimental measurements [32] of the $Z$+jets production rate. As reported in Refs. [14, 33], ALPGEN underestimates the fraction of $Z$+heavy-flavor ($b$ and $c$) jet events in inclusive $Z$+jets production. To compensate, we increase the normalization of $Z + b\bar{b}$ and $Z + c\bar{c}$ samples by a factor of 1.4 relative to the normalization of $Z + q\bar{q}$ samples.

Signal, $t\bar{t}$, and diboson ($WW$, $WZ$, $ZZ$) processes are modeled with PYTHIA. The production rate of $ZH$ and the Higgs boson branching ratios are set to the values in Refs. [9]. The $t\bar{t}$ simulation assumes a top-quark mass of 172.5 GeV/$c^2$ and is normalized to a production rate of 7.04 pb [35]. Diboson contributions are normalized to next-to-leading-order cross sections [36]. Each simulated sample includes a detailed GEANT-based detector simulation [37] and uses the CTEQ5L [38] parton distribution functions.

We account for the contributions from QCD multijet and $W$+jets processes using a data-derived model for misidentified $Z \to \ell^+\ell^-$ candidates. An electron and a jet have a small ($< 10^{-3}$) likelihood of being misidentified as two electrons. We model such misidentified $Z \to e^+e^-$ candidates using events containing a single electron and several jets. Each electron-jet pair in these events contributes to the model of misidentified $Z \to e^+e^-$, weighted by a factor reflecting the probability of the jet to be misidentified as an electron. The determination of the weights is described in Ref. [25]. The misidentified $Z \to \mu^+\mu^-$ contribution is modeled using like-sign muon pairs identified in the PreTag data [26].
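The weighting of electron-jet pairs can be sketched as a sum of per-pair weights. The actual misidentification probabilities are determined in Ref. [25] and are not reproduced here; the `fake_probability` callable and the function name below are hypothetical stand-ins, and any numerical value passed in is illustrative only.

```python
from typing import Callable, Iterable, List


def misidentified_zee_yield(
    jet_ets_per_event: Iterable[List[float]],
    fake_probability: Callable[[float], float],
) -> float:
    """Sum per-pair weights over single-electron events.

    jet_ets_per_event: for each single-electron event, the E_T values (GeV) of
        its jets; each jet forms one electron-jet pair with the electron.
    fake_probability: hypothetical map from jet E_T to the probability that the
        jet is misidentified as an electron.
    """
    total = 0.0
    for jet_ets in jet_ets_per_event:
        for et in jet_ets:
            total += fake_probability(et)
    return total
```

With a flat placeholder probability of $10^{-3}$, three electron-jet pairs would contribute a predicted yield of 0.003 misidentified candidates.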

We apply several corrections that affect the normalization of simulated samples. We correct the instantaneous luminosity profile of the simulated samples to match that observed in data. We correct the energy of lepton candidates to ensure agreement between the energy distributions in measured and simulated events, with corrections being approximately 1% of the uncorrected value. In addition, we apply corrections for differences in lepton and $b$-jet reconstruction and selection efficiencies in data and simulated samples. To account for the selection efficiency of the CDF II trigger system, we employ multivariate trigger emulation [25, 26]. For each of the three sets of triggers detailed above, a NN is trained on data events to describe the likelihood that the trigger system will select the event. The training data are selected via triggers independent of the set that each NN seeks to describe, using the same event kinematic information as the trigger system. The output of each NN is applied to each simulated event as a normalization factor, to reflect the per-event, kinematics-dependent probability of online selection as observed in data. Combining all background processes, we expect a total PreTag background of 19 000 ± 4 000 events, in good agreement with the observed total of 19 302. Event totals for observed data and expectations in the $b$-tagged sample are also in good agreement, with the background composition and totals listed for each $b$-tag category separately in Table I.
To separate a possible Higgs boson signal from background, we employ a method that utilizes NN discriminants. The multi-stage discriminant method enhances the isolation of simulated signal from background by combining a series of expert NNs with a master network. The master network is constructed to isolate the $ZH$ signal from all backgrounds simultaneously, while each expert network is optimized for discrimination against a single background component. Each NN is trained using simulated events meeting PreTag selection requirements. A $t\bar{t}$ expert network separates $ZH$ from $t\bar{t}$, a second $Z$+jets expert network separates signal from $Z + q\bar{q}$ and $Z + c\bar{c}$, and a third diboson expert separates $ZH$ from diboson processes. No network specifically optimized for discriminating misidentified $Z$ events is used, because they are observed to be well separated from $ZH$ events using only the $t\bar{t}$ expert, due to their characteristically large values of $\not\!\!E_T$.

Process           TT           TL           Tx           LL
$t\bar{t}$        55 ± 8.3     60 ± 8.5     90 ± 12      17 ± 2.5
Diboson           10 ± 1.5     14 ± 1.9     40 ± 4.0     8.7 ± 1.0
$Z + b\bar{b}$    59 ± 25      83 ± 35      239 ± 101    32 ± 14
$Z + c\bar{c}$    3.9 ± 1.7    19 ± 8.4     109 ± 47     24 ± 11
$Z + q\bar{q}$    1.0 ± 0.4    14 ± 3.5     192 ± 44     55 ± 14
Misid. $Z$        2.1 ± 1.0    15 ± 7.6     31 ± 15.4    10 ± 5.1
$ZH$ (predicted)  1.9 ± 0.3    2.0 ± 0.3    2.8 ± 0.4    0.5 ± 0.1
Total bkg.        131 ± 26     205 ± 38     701 ± 122    147 ± 23
Data              117          199          730          165

TABLE I: Comparison of the expected event totals for background and $ZH$ signal with the observed number of data events. Event totals are displayed grouped by $b$-tag category (TT, TL, Tx, LL). The $ZH$ totals assume $m_H = 125$ GeV/$c^2$. The displayed uncertainties are systematic. Statistical uncertainties are negligible for all model components except misidentified $Z$, for which they are comparable to the systematic uncertainty.

The final analysis is performed using the distribution of the master network scores for observed events in a binned final discriminant (BFD). A master network is optimized for each of 13 $m_H$ hypotheses (90 to 150 GeV/$c^2$ in 5 GeV/$c^2$ increments), with separate networks for two- and three-jet events. Each master NN is constructed to return a score between 0 and 0.25 for each event, while each expert returns a value between 0 and 1, with 0 being most background-like in all cases. The BFD has four regions (I, II, III, IV), each with a different signal expectation and background composition. Events are sorted into one of the regions based on the output of the three expert networks. If the $t\bar{t}$ expert returns a value of less than 0.5 ($t\bar{t}$-like), the event is assigned to region I. Otherwise, if the expert for $Z + q\bar{q}$ and $Z + c\bar{c}$ returns a score of less than 0.5 ($Z + q\bar{q}$/$Z + c\bar{c}$-like), the event is assigned to region II. Remaining events for which the diboson expert returns a value of less than 0.5 (diboson-like) are assigned to region III, with the remaining events being assigned to region IV.

The BFD is formed from the distribution of the master NN outputs plus an offset factor. Offset factors of 0, 0.25, 0.5, and 0.75 are set for events assigned to regions I, II, III, and IV, respectively. The output of the BFD is shown in Fig. 1(a) for Tx events and for the sum of TT, TL, and LL in Fig. 1(b). Histogram bins containing the highest expected ratio of signal-to-background in each region are those corresponding to higher BFD values, and the region of highest expected signal-to-background on average is region IV. The multi-stage discriminant technique enhances sensitivity to a Higgs boson signal by approximately 10% compared to the discriminant techniques employed in Ref. [14].
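The region assignment and offset can be written compactly. The sketch below assumes the three expert scores and the master score are already available as plain numbers; the function and argument names are introduced here for illustration and do not come from the analysis software.

```python
from typing import Tuple

# The 13 mass hypotheses: 90 to 150 GeV/c^2 in steps of 5 GeV/c^2.
MASS_HYPOTHESES = list(range(90, 151, 5))


def bfd_value(
    ttbar_score: float,
    zjets_score: float,
    diboson_score: float,
    master_score: float,
) -> Tuple[str, float]:
    """Return (region, BFD value) for one event.

    Expert scores lie in [0, 1] and the master score in [0, 0.25], with 0 the
    most background-like. Experts are tested in the order ttbar, Z+jets, diboson.
    """
    if not 0.0 <= master_score <= 0.25:
        raise ValueError("master score must lie in [0, 0.25]")
    if ttbar_score < 0.5:
        region, offset = "I", 0.0
    elif zjets_score < 0.5:
        region, offset = "II", 0.25
    elif diboson_score < 0.5:
        region, offset = "III", 0.5
    else:
        region, offset = "IV", 0.75
    return region, offset + master_score
```

With these offsets each region occupies its own quarter of the interval from 0 to 1, so the four regions do not overlap in the final histogram.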

We investigate the effect of several sources of systematic uncertainty on the search by propagating these uncertainties into the BFD distribution of the background and signal models. The uncertainty on the measured jet energy scale (JES) is observed to significantly affect both the rate and shape of the BFD distribution. BFD shapes generated by varying the JES by one standard deviation prior to event selection and reconstruction are used in the search for all simulated samples. Other systematic uncertainties are found to have a negligible impact on the shape of the BFD distribution and therefore are included as uncertainties affecting process rates. Uncertainty in the normalization of each simulated sample arises due to uncertainty in the integrated luminosity (6%), trigger efficiency (1-5%), the lepton energy scale (1.5%), the amount of initial or final state radiation (1-15%), $b$-tag algorithm efficiencies and $q$-jet tag probability (5-20%), and the JES (5-15%). The JES and $b$-tag algorithm uncertainties dominate.

A 50% uncertainty affects the normalization of the misidentified $Z \to \ell^+\ell^-$ prediction, uncorrelated between electron and muon samples. Uncertainties of 10% [35], 6% [36], 40%, and 40% are assumed for the normalization of top, diboson, $Z + b\bar{b}$, and $Z + c\bar{c}$ backgrounds, respectively. We assign a 5% uncertainty on the normalization of $ZH$ signal samples, and account for uncertainties on the value of $\mathcal{B}(H \to b\bar{b})$ [39]. In total, systematic uncertainties degrade sensitivity to a $ZH$ signal by approximately 13%.

We extract upper limits on $\sigma_{ZH} \times \mathcal{B}(H \to b\bar{b})$ using a Bayesian likelihood formed as a product of likelihoods over bins of the BFD distribution for all $b$-tagged candidates. We assume a uniform prior on the signal rate, and Gaussian priors for each systematic uncertainty, truncated so that no prediction is negative. We set Bayesian 95% C.L. upper limits on $\sigma_{ZH} \times \mathcal{B}(H \to b\bar{b})$ for each $m_H$ hypothesis. Expected upper limits are derived by randomly generating a series of statistical trials, derived from the background prediction and systematic uncertainties, and computing the median of the distribution of resulting upper limits. The upper limits on $\sigma_{ZH} \times \mathcal{B}(H \to b\bar{b})$ are displayed in Fig. 2 and Table II.
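To make the limit-setting idea concrete, the sketch below computes a flat-prior Bayesian upper limit on a signal-strength multiplier $\mu$ for a binned Poisson likelihood. It is a deliberately simplified stand-in and not the procedure used in the analysis: systematic uncertainties (the truncated Gaussian priors) are omitted, the posterior is integrated on a fixed grid, and the function name, `mu_max`, and `n_grid` are assumptions of this illustration. Every background expectation must be positive.

```python
import math
from typing import Sequence


def bayesian_upper_limit(
    observed: Sequence[float],
    background: Sequence[float],
    signal: Sequence[float],
    cl: float = 0.95,
    mu_max: float = 50.0,
    n_grid: int = 20001,
) -> float:
    """Upper limit on mu at credibility cl, with a uniform prior on mu >= 0.

    Likelihood: product over bins of Poisson(n_i | b_i + mu * s_i).
    The posterior is assumed negligible above mu_max.
    """
    step = mu_max / (n_grid - 1)
    log_like = []
    for k in range(n_grid):
        mu = k * step
        total = 0.0
        for n, b, s in zip(observed, background, signal):
            expected = b + mu * s
            # Poisson log-likelihood without the mu-independent log(n!) term.
            total += n * math.log(expected) - expected
        log_like.append(total)
    peak = max(log_like)
    posterior = [math.exp(v - peak) for v in log_like]  # flat prior on mu
    norm = sum(posterior)
    running = 0.0
    for k, p in enumerate(posterior):
        running += p
        if running >= cl * norm:
            return k * step
    return mu_max
```

A useful sanity check is the single-bin case with zero observed events, where the posterior is proportional to $e^{-\mu s}$ and the 95% limit approaches $-\ln(0.05)/s \approx 3.0/s$. Expressing the signal in units of the SM prediction makes the result directly comparable to limits quoted as multiples of the SM rate.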

FIG. 1: Distribution of the BFD output for all candidates meeting Tx or LL (a) and TT or TL (b) selections, compared to the sum of the expectation from background. A variable bin width is used to maintain sufficient statistics in simulated samples. The labels (I, II, III, IV) and vertical solid lines indicate the regions defined by the multi-stage discriminant method.

FIG. 2: Expected (dashed curve) and observed (solid line) $ZH$ cross section times branching fraction 95% C.L. upper limits divided by the SM prediction are shown as a function of the Higgs boson mass. The dark (light) band represents the $\pm 1\sigma$ ($\pm 2\sigma$) expected limit range.

We observe a broad excess for $m_H > 110$ GeV/$c^2$, peaking at 135 GeV/$c^2$ with a local significance of 2.4 standard deviations. Taking into account the limited $m_H$ resolution of our BFD, we account for a look-elsewhere effect of two, yielding a global significance of 2.1 standard deviations.
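The step from local to global significance can be checked numerically. The sketch below assumes the look-elsewhere effect acts as a simple trials factor multiplying the one-sided local p-value; the text does not spell out the exact procedure, so this is an illustrative approximation, and the function name is introduced here.

```python
from statistics import NormalDist


def global_significance(local_z: float, trials: float) -> float:
    """Convert a local significance to a global one with a simple trials factor.

    Uses one-sided Gaussian tail probabilities: p_global = trials * p_local.
    """
    norm = NormalDist()
    p_local = 1.0 - norm.cdf(local_z)
    p_global = trials * p_local
    if not 0.0 < p_global < 1.0:
        raise ValueError("global p-value must lie strictly between 0 and 1")
    return norm.inv_cdf(1.0 - p_global)
```

For a local significance of 2.4 standard deviations and a trials factor of two, this approximation gives a value that rounds to the 2.1 standard deviations quoted in the text.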

In conclusion, we have searched for the SM Higgs boson produced in association with a $Z$ boson, followed by the decays $Z \to \ell^+\ell^-$ and $H \to b\bar{b}$. Finding no significant evidence for the process, we set 95% C.L. upper limits on the $ZH$ production cross section times the $H \to b\bar{b}$ branching ratio for Higgs boson masses between 90 and 150 GeV/$c^2$. For a Higgs boson mass of 125 GeV/$c^2$ we observe (expect) a 95% C.L. upper limit of 7.1 (3.9) times the standard model prediction. Utilization of the full CDF II data set has improved sensitivity to a $ZH$ signal by 34% compared to the previously published analysis [14]. Improved analysis methods have produced an additional approximately 30% enhancement in sensitivity, resulting in the most sensitive search for $ZH \to \ell^+\ell^- b\bar{b}$ to date.